ON RELIABILITY FUNCTION OF BSC 
WITH NOISY FEEDBACK 



For information transmission a binary symmetric channel is used. There 
is also another noisy binary symmetric channel (feedback channel), and the 
transmitter observes without delay all the outputs of the forward channel via 
that feedback channel. The transmission of an exponential number of messages 
(i.e. the transmission rate is positive) is considered. The achievable decoding error 
exponent for such a combination of channels is investigated. It is shown that if the 
crossover probability of the feedback channel is less than a certain positive value, 
then the achievable error exponent is better than the decoding error exponent of 
the channel without feedback. 



§ 1. Introduction and main results 

The binary symmetric channel BSC($p$) with crossover probability $0 < p < 1/2$ (and $q = 1 - p$) is considered. It is assumed that there is also a feedback channel BSC($p_1$), and the transmitter observes (without delay) all outputs of the forward channel BSC($p$) via that noisy feedback channel. No coding is used in the feedback channel (i.e. the receiver simply resends to the transmitter all received outputs). In other words, the feedback channel is "passive" (see Fig. 1).

Fig. 1. Channel model

We consider the case when the overall transmission time $n$ and $M = e^{Rn}$ equiprobable messages $\{\theta_1, \ldots, \theta_M\}$ are given. After the moment $n$, the receiver makes a decision $\hat\theta$ on the message transmitted. We are interested in the best possible decoding error exponent (and whether it can exceed the similar exponent of the channel without feedback).

The research described in this publication was made possible in part by the Russian Fund for Fundamental Research (project numbers 06-01-00226 and 09-01-00536).

Such a model was considered in [1], where the case of a nonexponential (in $n$) number $M$ (i.e. $R = 0$) was investigated. In this paper we consider the case $M = e^{Rn}$, $R > 0$, strengthening the methods of [1]. The main difference is that, since $M$ is now exponential in $n$, we need a much more accurate investigation of the decoding error probability. Moreover, if $M$ is nonexponential in $n$, then we know the best code for use during phase I: it is an "almost equidistant" code (i.e. all its codeword distances equal $n/2 + o(n)$). If $R > 0$ then we do not know such a best code, and for that reason we choose the code randomly.

Some results for channels with noiseless feedback can be found in [2-12], and for the noisy feedback case in [13, 14] (see also the discussion in [1]).

We show that if the crossover probability $p_1$ of the feedback channel BSC($p_1$) is less than a certain positive value $p_0(R,p)$, then it is possible to improve the best error exponent $E(R,p)$ of BSC($p$) without feedback. The transmission method with one "switching" moment, giving such an improvement, is described in § 4. It is similar to the method used in [1].

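We will need some definitions and notations. For $L = 1, 2, \ldots$ define the critical rates $R_{\mathrm{crit},1}(p) > R_{\mathrm{crit},2}(p) > \ldots$ by

$$R_{\mathrm{crit},L}(p) = \ln 2 - h\left(\frac{p^{1/(L+1)}}{p^{1/(L+1)} + q^{1/(L+1)}}\right), \tag{1}$$

where $h(x) = -x\ln x - (1 - x)\ln(1 - x)$. For $L = 1$ we omit the index $L$ and simply write $R_{\mathrm{crit}}(p) = R_{\mathrm{crit},1}(p)$, $E(R,p) = E(R,p,1)$, etc.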
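As a numerical companion to formula (1), the following minimal sketch evaluates the critical rates in nats. The function names `h` and `r_crit` are illustrative choices and do not come from the text.

```python
import math

def h(x):
    """Binary entropy in nats: h(x) = -x ln x - (1 - x) ln(1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def r_crit(p, L=1):
    """Critical rate R_crit,L(p) of BSC(p) according to formula (1)."""
    q = 1.0 - p
    a = p ** (1.0 / (L + 1))
    b = q ** (1.0 / (L + 1))
    return math.log(2.0) - h(a / (a + b))

if __name__ == "__main__":
    # For p = 0.01 the text quotes R_crit of about 0.387.
    for L in (1, 2, 3):
        print(L, round(r_crit(0.01, L), 4))
```

For $p = 0.01$ the value for $L = 1$ should agree with the figure $R_{\mathrm{crit}} \approx 0.387$ quoted later in the paper, and the values should decrease as $L$ grows.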

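Define the new critical rate $R_2 = R_2(p)$ as the unique root of the equation [17]

$$\min_{\substack{0 \le \tau \le \alpha \le 1/2 \\ h(\alpha) - h(\tau) = \ln 2 - R_2}} \frac{\alpha(1 - \alpha) - \tau(1 - \tau)}{1 + 2\sqrt{\tau(1 - \tau)}} = \frac{\sqrt{pq}}{1 + 2\sqrt{pq}}\,.$$

Then $0 < R_2(p) < R_{\mathrm{crit}}(p)$, $0 < p < 1/2$.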

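Denote by $C(p) = \ln 2 - h(p)$ the capacity of the BSC($p$), and by $E_{\mathrm{sp}}(R,p)$ the sphere-packing exponent

$$E_{\mathrm{sp}}(R,p) = D(\delta_{\mathrm{GV}}(R) \,\|\, p), \qquad D(x \,\|\, y) = x\ln\frac{x}{y} + (1 - x)\ln\frac{1 - x}{1 - y}\,,$$

where $\delta_{\mathrm{GV}}(R) \le 1/2$ is defined by the relation

$$\ln 2 - R = h(\delta_{\mathrm{GV}}(R)).$$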
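The quantities just defined are easy to evaluate numerically. The sketch below is only an illustration: it finds $\delta_{\mathrm{GV}}(R)$ by bisection (using that $h$ increases on $[0, 1/2]$) and then evaluates $E_{\mathrm{sp}}(R,p)$. All function names are hypothetical, and rates are measured in nats.

```python
import math

def h(x):
    """Binary entropy in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def kl(x, y):
    """Binary divergence D(x || y) in nats, for 0 <= x <= 1 and 0 < y < 1."""
    out = 0.0
    if x > 0.0:
        out += x * math.log(x / y)
    if x < 1.0:
        out += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return out

def capacity(p):
    """C(p) = ln 2 - h(p)."""
    return math.log(2.0) - h(p)

def delta_gv(R, tol=1e-12):
    """Root delta <= 1/2 of ln 2 - R = h(delta), for 0 <= R <= ln 2."""
    target = math.log(2.0) - R
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def e_sp(R, p):
    """Sphere-packing exponent E_sp(R,p) = D(delta_GV(R) || p)."""
    return kl(delta_gv(R), p)
```

A quick sanity check is that $\delta_{\mathrm{GV}}(C(p)) = p$, so the sphere-packing exponent vanishes at capacity.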

Denote by $E(R,p)$ the best decoding error exponent (the reliability function) of BSC($p$) without feedback. For $R_2(p) \le R \le C(p)$, and for $R = 0$, the function $E(R,p)$ is known exactly [17]:

$$E(R,p) = E_{\mathrm{r}}(R,p) = \begin{cases} \ln 2 - \ln(1 + 2\sqrt{pq}) - R, & R_2(p) \le R \le R_{\mathrm{crit}}(p), \\ E_{\mathrm{sp}}(R,p), & R_{\mathrm{crit}}(p) \le R \le C(p), \end{cases} \tag{2}$$

$$E(0,p) = E_{\mathrm{ex}}(0,p) = \frac{1}{4}\ln\frac{1}{4pq}\,,$$

where $E_{\mathrm{r}}(R,p)$, $E_{\mathrm{ex}}(R,p)$ are the "random coding" bounds [6, 15, 16] (see § 6).

For $0 < R < R_2(p)$ only lower and upper bounds for the function $E(R,p)$ are known. To describe the best known lower bound (the exponent $E_{\mathrm{ex}}(R,p)$ of random coding with "expurgation"), introduce the rate $R_{\min}(p)$ (see (40)). Then $0 < R_{\min}(p) < R_2(p) < R_{\mathrm{crit}}(p)$, $0 < p < 1/2$, and the best known lower bound [15, 16] has the form

$$E(R,p) \ge E_{\mathrm{ex}}(R,p) = \ln 2 - \ln(1 + 2\sqrt{pq}) - R, \qquad R_{\min}(p) \le R \le R_2(p). \tag{3}$$

Denote by $E(R,p,L)$ the best list size $L$ decoding error exponent of BSC($p$) without feedback. It is known that $E(R,p,L) = E_{\mathrm{r}}(R,p,L) = E_{\mathrm{sp}}(R,p)$, $R_{\mathrm{crit},L}(p) \le R \le C(p)$ [6, 15, 16] and $E(0,p,L) = E_{\mathrm{ex}}(0,p,L)$ [18], where the "random coding" $E_{\mathrm{r}}(R,p,L)$ and the "random coding with expurgation" $E_{\mathrm{ex}}(R,p,L)$ bounds are described in § 6.

For $0 < R < R_{\mathrm{crit},L}(p)$ the best known lower bound for $E(R,p,L)$ has the form [15, 16]

$$E(R,p,L) \ge E_{\mathrm{ex}}(R,p,L), \qquad 0 \le R \le R_{\mathrm{crit},L}(p). \tag{4}$$

We also have $E_{\mathrm{ex}}(R,p,L) = E_{\mathrm{r}}(R,p,L)$, $R_{\min,L}(p) \le R \le R_{\mathrm{crit},L}(p)$ (see (42)). Denote

$$E_{\mathrm{low}}(R,p,L) = \max\{E_{\mathrm{r}}(R,p,L),\, E_{\mathrm{ex}}(R,p,L)\}. \tag{5}$$

Denote by $F(R,p)$ the best decoding error exponent of BSC($p$) with noiseless feedback. Then

$$E(R,p) = F(R,p) = E_{\mathrm{sp}}(R,p), \qquad R_{\mathrm{crit}}(p) \le R \le C(p) \quad [3],$$

$$E(R,p) \le F(R,p) \le E_{\mathrm{sp}}(R,p), \qquad 0 \le R \le R_{\mathrm{crit}}(p) \quad [3],$$

$$E(0,p) < F(0,p) = -\ln\left(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\right) \quad [5].$$

Denote by $F(R,p,p_1)$ the best decoding error exponent of BSC($p$) with the noisy feedback channel BSC($p_1$). Clearly, $E(R,p) \le F(R,p,p_1) \le F(R,p)$ for all $p, p_1$. In particular, $F(R,p,0) = F(R,p)$, $F(R,p,1/2) = E(R,p)$.

Denote by $E_2(p)$ the best error exponent for two codewords over BSC($p$) (clearly, it remains the same for the channel with noiseless feedback as well)

$$E_2(p) = \frac{1}{2}\ln\frac{1}{4pq}\,. \tag{6}$$

Denote by $F_1(R,p,p_1)$ the decoding error exponent of the transmission method described in § 4 (with one switching moment). The inequality $F_1(R,p,p_1) > E(R,p)$ is possible only when $R < R_{\mathrm{crit}}(p)$.
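To get a feeling for the sizes of the zero-rate exponents introduced above, the following sketch evaluates $E(0,p)$, $F(0,p)$ and $E_2(p)$ from the closed forms in the text. The function names are hypothetical helpers.

```python
import math

def e_zero(p):
    """E(0,p) = (1/4) ln(1/(4pq)): zero-rate exponent without feedback."""
    q = 1.0 - p
    return 0.25 * math.log(1.0 / (4.0 * p * q))

def f_zero(p):
    """F(0,p) = -ln(p^(1/3) q^(2/3) + p^(2/3) q^(1/3)): noiseless feedback."""
    q = 1.0 - p
    return -math.log(p ** (1.0 / 3.0) * q ** (2.0 / 3.0)
                     + p ** (2.0 / 3.0) * q ** (1.0 / 3.0))

def e_two(p):
    """E_2(p) = (1/2) ln(1/(4pq)): exponent for two codewords, formula (6)."""
    q = 1.0 - p
    return 0.5 * math.log(1.0 / (4.0 * p * q))

if __name__ == "__main__":
    p = 0.01
    print(e_zero(p), f_zero(p), e_two(p))
```

For $p = 0.01$ the three values are ordered as $E(0,p) < F(0,p) < E_2(p)$, and $E_2(p) = 2E(0,p)$, which is the relation used in § 5.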

To describe the function $p_0(R,p)$ of the critical noise level in the feedback channel, introduce the function

$$t_0(R,p) = \frac{3[E_{\mathrm{low}}(R,p,2) - E_{\mathrm{low}}(R,p)]}{\ln(q/p)}\,, \tag{7}$$

where $E_{\mathrm{low}}(R,p,2)$, $E_{\mathrm{low}}(R,p) = E_{\mathrm{low}}(R,p,1)$ are defined in (5).

The function $t_0(R,p)$ monotonically decreases in $R$. For a given $R \ge 0$ it first increases in $p$, and then decreases. Moreover,

$$\max_{R,p} t_0(R,p) = \max_{p} t_0(0,p) \approx t_0(0, 0.0124) \approx 0.1322.$$

Introduce the function $p_0 = p_0(R,p) \le t_0(R,p)$ as the unique root of the equation

$$D(t_0(R,p) \,\|\, p_0) = 2R. \tag{8}$$

In particular,

$$p_0(0,p) = t_0(0,p) = \frac{3\left[\ln 4 - 3\ln\left(p^{1/3} + q^{1/3}\right)\right]}{4\ln(q/p)}\,.$$
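The closed form for $t_0(0,p)$ makes the numerical claim about its maximum easy to reproduce. The following is a minimal sketch with a plain grid search; the grid step and the function names are arbitrary illustrative choices.

```python
import math

def t0_zero_rate(p):
    """t_0(0,p) = 3[ln 4 - 3 ln(p^(1/3) + q^(1/3))] / (4 ln(q/p)), 0 < p < 1/2."""
    q = 1.0 - p
    num = 3.0 * (math.log(4.0)
                 - 3.0 * math.log(p ** (1.0 / 3.0) + q ** (1.0 / 3.0)))
    return num / (4.0 * math.log(q / p))

def argmax_t0(step=1e-4):
    """Grid search for the maximizer of t_0(0,p) over 0 < p < 1/2."""
    grid = [i * step for i in range(1, int(0.5 / step))]
    p_star = max(grid, key=t0_zero_rate)
    return p_star, t0_zero_rate(p_star)

if __name__ == "__main__":
    print(argmax_t0())
```

The search should land close to the values $p \approx 0.0124$ and $t_0 \approx 0.1322$ stated above. Since the function is rather flat near its maximum, the location of the maximizer is less sharply determined than the maximal value.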

Define also $t_1 = t_1(R,p_1) \ge p_1$ as the unique root of the equation

$$D(t_1 \,\|\, p_1) = 2R. \tag{9}$$
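Equation (9) has no closed-form solution, but since $D(t \,\|\, p_1)$ increases in $t$ on $[p_1, 1)$, its root can be found by bisection. The sketch below is one possible way to do it; the helper names and the tolerance are assumptions made for illustration.

```python
import math

def kl(x, y):
    """Binary divergence D(x || y) in nats, for 0 <= x <= 1 and 0 < y < 1."""
    out = 0.0
    if x > 0.0:
        out += x * math.log(x / y)
    if x < 1.0:
        out += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return out

def t1(R, p1, tol=1e-12):
    """Root t1 >= p1 of D(t1 || p1) = 2R (equation (9)), by bisection."""
    target = 2.0 * R
    if target >= math.log(1.0 / p1):
        # D(t || p1) never exceeds D(1 || p1) = ln(1/p1).
        raise ValueError("no root in [p1, 1): 2R is too large")
    lo, hi = p1, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl(mid, p1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $R = 0$ the root is $t_1 = p_1$, in agreement with $p_0(0,p) = t_0(0,p)$ above.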



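The main result of the paper is

Theorem 1. If $R < R_{\mathrm{crit}}(p)$ and $p_1 < p_0(R,p)$, then

$$F_1(R,p,p_1) \ge \max_{0 \le \gamma \le 1} T(R,p,p_1,\gamma) > \begin{cases} E_{\mathrm{ex}}(R,p), & 0 \le R \le R_2(p), \\ E(R,p), & R_2(p) \le R < R_{\mathrm{crit}}(p), \end{cases} \tag{10}$$

where

$$T(R,p,p_1,\gamma) = \min\left\{\gamma E_{\mathrm{low}}(R/\gamma,p,2) - \frac{\gamma\, t_1(R/\gamma,p_1)}{3}\ln\frac{q}{p}\,,\ \ \gamma E_{\mathrm{low}}(R/\gamma,p) + (1 - \gamma)E_2(p)\right\}. \tag{11}$$

In other words, for any $R < R_{\mathrm{crit}}(p)$ and $p_1 < p_0(R,p)$ the function $F_1(R,p,p_1)$ is bigger (i.e. better) than the best known lower bound for the decoding error exponent of BSC($p$) without feedback.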

Moreover, there exists a positive function $p_2(R,p)$ such that the following result holds.
Corollary 1. If $R < R_{\mathrm{crit}}(p)$ and $p_1 < p_2(R,p)$, then

$$F_1(R,p,p_1) \ge \max_{0 \le \gamma \le 1} T(R,p,p_1,\gamma) > E(R,p). \tag{12}$$

This result follows from the proof of Theorem 2 (see § 3) and the fact that the function $T(R,p,p_1,\gamma)$ is continuous in $p_1$.

Remark 1. We do not try to find the best function $p_0(R,p)$, limiting ourselves to rather simple estimates for it.

Fig. 2 shows the plot of the function $p_0(R,p)$ for $p = 0.01$ ($R_{\mathrm{crit}} \approx 0.387$). Note that here $p_0(R,p) > p$ for small $R$.

It is more convenient for us to investigate first the function $F_1(R,p,p_1)$ for $p_1 = 0$, i.e. for the channel with noiseless feedback. Then the next result holds.

Theorem 2. If $0 < p < 1/2$, $R < R_{\mathrm{crit}}(p)$, then

$$F_1(R,p,0) = F_1(R,p) \ge \gamma_0 E_{\mathrm{low}}(R/\gamma_0,p,2) > E(R,p), \tag{13}$$

where $\gamma_0 \in (R/R_{\mathrm{crit}}(p), 1)$ is the largest root of the equation (20).

Remark 2. If $p_1 \to 0$, then the relations (10), (11) turn into the similar relation (13) for the channel with noiseless feedback (see also remark 6 in § 4).

Remark 3. The transmission method described in § 4 reduces the problem to testing the two most probable (at a fixed moment) messages. Such a strategy is not optimal even for one switching moment (at least, if $p_1$ is very small). But it is relatively simple to investigate, and it already gives a reasonable improvement over the channel without feedback.

Remark 4. In the preliminary publication [19, Proposition] it was claimed that $p_0(R,p) = 1/2$ for some range of rates $R$. A miscalculation was found in the proof of that result.

Below, in § 2, an informal description of the transmission method is given. In § 3 the transmission method with one switching moment in the case of the channel with noiseless feedback is described and analyzed, and Theorem 2 is proved. In § 4 that method (slightly modified) is investigated for the channel with noisy feedback and Theorem 1 is proved. In § 5 it is clarified for which $p_1$ noisy feedback behaves approximately like noiseless feedback. Some of the formulas used and some auxiliary results are presented in § 6.

A preliminary (and simplified) paper variant (without detailed proofs) was published in 

§ 2. Informal description of the transmission method 

We use the transmission method with one fixed switching moment at which the coding function is changed. That method is based on one idea and one useful observation.
Idea. It is based on the inequality which follows from (41)

$$E_{\mathrm{ex}}(R,p) < E_{\mathrm{low}}(R,p,2), \qquad R < R_{\mathrm{crit}}(p). \tag{14}$$

Considering only $R < R_{\mathrm{crit}}(p)$ we choose some positive $\gamma < 1$ and partition the total transmission period $[1,n]$ into two phases: $[1,\gamma n]$ (phase I) and $(\gamma n, n]$ (phase II) (at first we may think that $\gamma$ is rather close to one).

On phase I (i.e. on $[1,\gamma n]$) we use the "best" code of $M$ codewords $\{x_i\}$ of length $\gamma n$ (see below). On that phase the transmitter only observes (via the feedback channel) outputs of the forward channel, but does not change the coding function. We set the value $\gamma = \gamma(R,p)$ such that

$$E_{\mathrm{ex}}(R,p) < \gamma E_{\mathrm{low}}(R/\gamma,p,2), \qquad R < R_{\mathrm{crit}}(p) \tag{15}$$

(it is always possible due to continuity of the function $\gamma E_{\mathrm{low}}(R/\gamma,p,2)$ in $\gamma$ and the condition (14)). After phase I (at moment $\gamma n$) the receiver selects the two most probable messages $\theta_i, \theta_j$. By the condition (15), the exponent of the probability that the true message $\theta_{\mathrm{true}}$ is not among the chosen messages $\theta_i, \theta_j$ will be larger (i.e. better) than $E_{\mathrm{ex}}(R,p)$. Assume that by some means the transmitter is also able to recover those two most probable messages $\theta_i, \theta_j$ (it is certainly so in the noiseless feedback case). Then, on phase II (i.e. on $(\gamma n, n]$) the transmitter only helps the receiver to decide between those two most probable messages $\theta_i, \theta_j$, using two opposite codewords of length $(1 - \gamma)n$. The error exponent $E_2(p)$ (see (6)) on that phase is better than all other exponents involved. As a result, it gives an overall decoding error exponent better than $E_{\mathrm{ex}}(R,p)$.

It remains to find a way for the transmitter to recover those two most probable messages $\theta_i, \theta_j$. It may seem that this is always possible if the value $p_1$ is sufficiently small. But it is not true. With high probability (even close to one) the second $\theta_j$ and the third $\theta_k$ most probable messages will be approximately equiprobable, and then, for any $p_1 > 0$, the transmitter will not be able to rank them correctly (due to noise in the feedback channel).

Observation. Fortunately, in that case (with high probability) the most probable message $\theta_i$ will be much more probable than the second most probable message $\theta_j$. In such a case the receiver makes a decision immediately after phase I (in favor of the most probable message $\theta_i$), and it ignores all subsequent signals from the transmitter.

The description given is rather intuitive, and it should be checked analytically (which is done below).

§ 3. Channel with noiseless feedback. Proof of Theorem 2 

For simplicity, we start with the noiseless feedback case and describe formally the transmission method which (after some modification) will be used for noisy feedback as well. Moreover, in the noisy feedback case we will need some formulas from the noiseless feedback case.

Denote by $F_1(R,p) = F_1(R,p,0)$ the decoding error exponent of the transmission method described below (with one switching moment).

Proof of Theorem 2. We consider $M = e^{Rn}$ messages $\theta_1, \ldots, \theta_M$. Using some $\gamma \in [0,1]$ (it will be chosen later), we partition the total transmission time $[1,n]$ into two phases: $[1,\gamma n]$ (phase I) and $(\gamma n, n]$ (phase II). We proceed as follows.

1) On phase I (i.e. on $[1,\gamma n]$) we use the "best" code of $M$ codewords $\{x_i\}$ of length $\gamma n$ (see below). On that phase the transmitter only observes (via the feedback channel) outputs of the forward channel, but does not change the coding function.

2) Let $x$ be the transmitted codeword (of length $\gamma n$) and $y$ be the block received by the receiver. After phase I, based on the block $y$, the transmitter selects the two messages $\theta_i, \theta_j$ (codewords $x_i, x_j$) which are the most probable for the receiver, and ignores all the remaining messages $\{\theta_k\}$. If the true message $\theta_{\mathrm{true}}$ is among the selected messages $\theta_i, \theta_j$, then on phase II (i.e. on $(\gamma n, n]$) the transmitter only helps the receiver to decide between those two most probable messages $\theta_i, \theta_j$, using two opposite codewords of length $(1 - \gamma)n$. If the true message $\theta_{\mathrm{true}}$ is not among those two selected messages, then the transmitter sends an arbitrary block. After moment $n$ the receiver makes a decision in favor of the most probable of those two remaining messages $\theta_i, \theta_j$ (based on all signals received on $[1,n]$).

Clearly, a decoding error occurs in the following two cases.

1) After phase I the true message is not among the two most probable messages. We denote that probability by $P_1$.
2) After phase I the true message is among the two most probable, but after phase II it is not the most probable. We denote that probability by $P_{20}$.
Then for the total decoding error probability $P_{\mathrm{e}}$ we have

$$P_{\mathrm{e}} \le P_1 + P_{20}. \tag{16}$$

On phase I (of length $\gamma n$) we use a code for which two decoding error probabilities are small: the usual one and the one for decoding with list size $L = 2$. Then there exists a code such that for $P_1$ we have (see § 6)

$$\frac{1}{n}\ln\frac{1}{P_1} \ge \gamma E_{\mathrm{low}}(R/\gamma,p,2) + o(1), \qquad n \to \infty. \tag{17}$$

Now we evaluate the probability $P_{20}$. Denote by $d(x,y)$ the Hamming distance between $x$ and $y$, and $d_{ij} = d(x_i,x_j)$. On phase I (of length $\gamma n$) the distances among codewords are $\{d_{ij}\}$. On phase II (of length $(1 - \gamma)n$) the distance between the two remaining codewords equals $(1 - \gamma)n$. Therefore, the total distance between the true and the competing codewords equals $d_{ij} + (1 - \gamma)n$. Then there exists a code such that (see derivation in § 6)

$$\frac{1}{n}\ln\frac{1}{P_{20}} \ge \gamma E_{\mathrm{low}}(R/\gamma,p) + (1 - \gamma)E_2(p) + o(1). \tag{18}$$

Moreover, there exists a code for which both relations (17) and (18) are fulfilled (see § 6). Then from (16)-(18) we have

$$\frac{1}{n}\ln\frac{1}{P_{\mathrm{e}}} \ge \frac{1}{n}\min\left\{\ln\frac{1}{P_1},\, \ln\frac{1}{P_{20}}\right\} - \frac{\ln 2}{n} \ge \min\left\{\gamma E_{\mathrm{low}}(R/\gamma,p,2),\ \gamma E_{\mathrm{low}}(R/\gamma,p) + (1 - \gamma)E_2(p)\right\} + o(1),$$

where $E_2(p)$ is defined in (6). Therefore

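$$F_1(R,p) \ge \max_{0 \le \gamma \le 1}\min\left\{\gamma E_{\mathrm{low}}(R/\gamma,p,2),\ \gamma E_{\mathrm{low}}(R/\gamma,p) + (1 - \gamma)E_2(p)\right\}, \tag{19}$$

where $E_{\mathrm{low}}(R,p,2)$ and $E_{\mathrm{low}}(R,p)$ are defined in (5) (see also § 6).

Note that the function $\gamma E_{\mathrm{low}}(R/\gamma,p,2)$ from the right-hand side of (19) monotonically increases in $\gamma$. On the contrary, the function $S(\gamma,R,p) = \gamma E_{\mathrm{low}}(R/\gamma,p) + (1 - \gamma)E_2(p)$ monotonically decreases in $\gamma$. Indeed, denoting $r = R/\gamma$ and omitting $p$, we have $S'_\gamma(\gamma,R) = E_{\mathrm{low}}(r) - rE'_{\mathrm{low}}(r) - E_2$, and the derivative of this expression with respect to $r$ equals $-rE''_{\mathrm{low}}(r) \le 0$. Therefore the maximum over $R, \gamma$ of the value $S'_\gamma(\gamma,R)$ is attained when $r \to 0$. Since $rE'_{\mathrm{low}}(r) \to 0$ as $r \to 0$, we get

$$\max S'_\gamma(\gamma,R) = E_{\mathrm{low}}(0) - E_2 < 0.$$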

We consider only the case $R < R_{\mathrm{crit}}(p)$, i.e. when $E_{\mathrm{low}}(R,p,2) > E_{\mathrm{low}}(R,p)$. For such $R$ it is best to set $\gamma = \gamma_0$ such that $P_1 = P_{20}$, i.e.

$$\gamma_0 E_{\mathrm{low}}(R/\gamma_0,p,2) = \gamma_0 E_{\mathrm{low}}(R/\gamma_0,p) + (1 - \gamma_0)E_2(p). \tag{20}$$

Both sides of (20) are continuous functions of $\gamma_0$. The left-hand side of (20) monotonically increases in $\gamma_0$, and the right-hand side monotonically decreases in $\gamma_0$. With $\gamma_0 = 1$ the left-hand side is greater than the right-hand side, which equals $E_{\mathrm{low}}(R,p)$. On the contrary, for $\gamma_0 = R/R_{\mathrm{crit}}$ the right-hand side is greater than the left-hand side. Then there exists a unique $\gamma_0 \in (R/R_{\mathrm{crit}}, 1)$ satisfying (20). Therefore we get

$$F_1(R,p) \ge \gamma_0 E_{\mathrm{low}}(R/\gamma_0,p) + (1 - \gamma_0)E_2(p) > E_{\mathrm{low}}(R,p). \tag{21}$$

We show that, in fact, $F_1(R,p)$ satisfies the stronger inequality (13), although we know exactly only part of the function $E(R,p)$, $0 \le R \le R_{\mathrm{crit}}(p)$ (see (2)). If we connect the points $E(0,p)$ and $E(R_{\mathrm{crit}},p)$ by a straight line segment, then due to the "straight-line bound" [20], for $0 \le R \le R_{\mathrm{crit}}$ the function $E(R,p)$ does not exceed that straight line. Therefore, if $0 \le R \le R_{\mathrm{crit}}(p)$ and $0 < p < 1/2$ then the inequality holds

$$E(R,p) \le E(0,p) - \frac{[E(0,p) - E(R_{\mathrm{crit}}(p),p)]R}{R_{\mathrm{crit}}(p)}\,.$$

Now, to establish the formula (13), it is sufficient to check that for such $p, R$ the following strict inequality is valid

$$\gamma_0 E_{\mathrm{ex}}(R/\gamma_0,p,2) > E(0,p) - \frac{[E(0,p) - E(R_{\mathrm{crit}}(p),p)]R}{R_{\mathrm{crit}}(p)}\,. \tag{22}$$

For that purpose it is convenient to introduce the parameter $u = R/\gamma_0$, $u \in (0, R_{\mathrm{crit}})$. Then we get the parametric representation for $\gamma_0 = \gamma_0(u,p)$ and $R = R(u,p)$:

$$\gamma_0 = \frac{E_2(p)}{E_2(p) + E_{\mathrm{ex}}(u,p,2) - E_{\mathrm{ex}}(u,p)}\,, \qquad R = u\gamma_0.$$

Then, combining analytical and numerical methods, it is not difficult to check the validity of the inequality (22). This concludes the proof of Theorem 2. ▲

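In Fig. 3 the plots of the functions $F_1(R,p)$ and $E_{\mathrm{ex}}(R,p)$ for $p = 0.01$ ($R_{\mathrm{crit}} \approx 0.387$) are shown.

To compare the functions $F_1(R,p)$ and $E(R,p)$ consider
Example 1. Let $p = (1 - \varepsilon)/2$, $\varepsilon \to 0$. Then

$$C(p) = \frac{\varepsilon^2}{2} + O(\varepsilon^4), \qquad R_{\mathrm{crit}}(p) = \frac{C(p)[1 + o(1)]}{4}\,, \qquad R_{\min,2}(p) < R_{\min}(p) = O(C^2).$$

Therefore when $p \to 1/2$ the expurgation bound is, essentially, not applicable and we get the known results [15]

$$E(R,p)[1 + o(1)] = \begin{cases} C/2 - R, & 0 \le R \le C/4, \\ (\sqrt{C} - \sqrt{R})^2, & C/4 \le R \le C, \end{cases}$$

and

$$E(R,p,2)[1 + o(1)] \ge E_{\mathrm{r}}(R,p,2) = \begin{cases} 2C/3 - 2R, & 0 \le R \le C/9, \\ (\sqrt{C} - \sqrt{R})^2, & C/9 \le R \le C. \end{cases}$$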
From those formulas and (7) we also have

$$4\varepsilon\, t_0(R,p)[1 + o(1)] = \begin{cases} C - 6R, & 0 \le R \le C/9, \\ 3(\sqrt{C} - 2\sqrt{R})^2, & C/9 \le R \le C/4, \\ 0, & C/4 \le R \le C. \end{cases} \tag{23}$$

Consider the equation (20). For $R < R_{\mathrm{crit}}(p) = C(p)[1 + o(1)]/4$, two cases are possible: $R/\gamma_0 \le C/9$ and $C/9 \le R/\gamma_0 \le C/4$.
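1) Let $R/\gamma_0 \le C/9$. Then from (20) we get

$$\gamma_0 = \frac{6(R + C)}{7C}\,, \qquad F_1(R,p) = \frac{4C - 10R}{7}\,, \qquad R \le \frac{2C}{19}\,,$$

and

$$\frac{F_1(R,p)}{E(R,p)} = \frac{8}{7} - \frac{4R}{7(C - 2R)}\,, \qquad R \le \frac{2C}{19}\,.$$

The ratio $F_1(R,p)/E(R,p)$ monotonically decreases from 8/7 (for $R = 0$) down to 16/15 (for $R = 2C/19$).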

2) Let $C/9 \le R/\gamma_0 \le C/4$. Then we get

$$\sqrt{\gamma_0} = \frac{2\sqrt{R} + \sqrt{6C - 8R}}{3\sqrt{C}}\,, \qquad \frac{2C}{19} \le R \le \frac{C}{4}\,,$$

and

$$F_1(R,p) = \frac{1}{9}\left[6C - 7R - 2\sqrt{2R(3C - 4R)}\right].$$

The ratio $F_1(R,p)/E(R,p)$ monotonically decreases from 16/15 (for $R = 2C/19$) down to 1 (for $R = C/4$).

It is natural to expect that similar results will also hold in the case of the noisy feedback channel BSC($p_1$), if $p_1$ is sufficiently small.
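The closed forms of Example 1 can be cross-checked by solving equation (20) numerically in the very noisy limit. The sketch below is an illustration under explicit assumptions: the $o(1)$ terms are dropped, the exponents for list sizes 1 and 2 are replaced by their limiting forms from Example 1, $E_2(p)$ is replaced by $C$ (since $E_2 = 2E(0,p) \approx C$), and $R_{\mathrm{crit}}$ by $C/4$. All function names are hypothetical.

```python
import math

def e_list1(r, C):
    """Limiting form of E(r,p): C/2 - r for r <= C/4, else (sqrt C - sqrt r)^2."""
    return C / 2 - r if r <= C / 4 else (math.sqrt(C) - math.sqrt(r)) ** 2

def e_list2(r, C):
    """Limiting form of E_r(r,p,2): 2C/3 - 2r for r <= C/9, else (sqrt C - sqrt r)^2."""
    return 2 * C / 3 - 2 * r if r <= C / 9 else (math.sqrt(C) - math.sqrt(r)) ** 2

def gamma0(R, C, tol=1e-14):
    """Root of (20) on (R/R_crit, 1), by bisection; assumes 0 <= R < C/4."""
    def gap(g):
        # Left-hand side minus right-hand side of (20); increasing in g.
        r = R / g
        return g * e_list2(r, C) - (g * e_list1(r, C) + (1 - g) * C)
    lo, hi = max(4 * R / C, 1e-12), 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f1(R, C):
    """Exponent gamma_0 * E(R/gamma_0, 2) of the one-switch method (noiseless feedback)."""
    g = gamma0(R, C)
    return g * e_list2(R / g, C)

if __name__ == "__main__":
    C = 1.0
    for R in (0.0, 2 * C / 19, 0.2 * C):
        print(R, gamma0(R, C), f1(R, C) / e_list1(R, C))
```

With $C = 1$ the ratio $F_1/E$ comes out as $8/7$ at $R = 0$ and $16/15$ at $R = 2C/19$, and in the second regime the numerical value agrees with the closed form containing the square root.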

§ 4. Channel with noisy feedback. Proof of Theorem 1 

In the noisy feedback case we will still use the transmission method with one switching moment. But if we try to use exactly the same method as in the noiseless feedback case, we will face the following problem. After phase I, the transmitter should find the two most probable (for the receiver) codewords $x^1, x^2$. But with relatively high probability, the second and the third ranked codewords $x^2$ and $x^3$ will be approximately equiprobable, and therefore it will be difficult for the transmitter to rank them correctly (due to noise in the feedback). Fortunately, in that case (with high probability) the most probable codeword $x^1$ will be much more probable than $x^2$, and then (again with high probability) $x^1$ is the true codeword. We use this observation as follows: if the posterior probabilities of the second $x^2$ and the third $x^3$ ranked codewords are not very different, the receiver makes a decision immediately after phase I (in favor of the most probable codeword $x^1$), and it ignores all subsequent signals from the transmitter on phase II.
As a result, we use the following transmission and decoding method.

Transmission. We set a number $0 < \gamma < 1$. On phase I, of length $m = \gamma n$, we use a "good" code (it is explained below). Let $x_{\mathrm{true}}$ be the transmitted codeword of length $m$, $y$ be the block received by the receiver, and $x'$ be the block received by the transmitter. The transmitter selects one more codeword $x_i \ne x_{\mathrm{true}}$, closest to $x'$, i.e. a codeword $x_i \ne x_{\mathrm{true}}$ such that $d(x_i,x') = \min\limits_{j:\, x_j \ne x_{\mathrm{true}}} d(x_j,x')$. As a result, the transmitter builds a list of two messages: the true one $\theta_{\mathrm{true}}$ and another message $\theta_i \ne \theta_{\mathrm{true}}$, which looks most probable among the remaining ones.

A "good" code of length $m$ should have the following properties:

1) Its decoding error probability $P_{\mathrm{e}}$ satisfies the inequality $P_{\mathrm{e}} \le e^{-E_{\mathrm{low}}(R,p)m}$;

2) Its list size $L = 2$ decoding error probability $P_{\mathrm{e}}(2)$ satisfies the similar inequality $P_{\mathrm{e}}(2) \le e^{-E_{\mathrm{low}}(R,p,2)m}$;

3) The relations (18) and (26) hold for it.

The existence of such a code is shown in § 6, by slightly modifying Gallager's standard arguments for the expurgation bound [15, 16].

On phase II (i.e. on $(\gamma n, n]$) the transmitter uses two opposite codewords of length $n - m = (1 - \gamma)n$ (for example, consisting of all zeros and all ones), in order to help the receiver to decide between the true message $\theta_{\mathrm{true}}$ and another most probable message $\theta_i \ne \theta_{\mathrm{true}}$.

This transmission method is a slight modification of the method used in [1]. It gives the same decoding error probability exponent, but it is simpler to analyze. If the true message $\theta_{\mathrm{true}}$ is not among the two most probable messages for the receiver, then there will always be a decoding error. The slight modification of the transmission method from [1] used here helps in the case when the true message $\theta_{\mathrm{true}}$ is among the two most probable messages for the receiver, but it is not so for the transmitter.

Decoding. We set a number $t > 0$. Arrange the Hamming distances $\{d(x_i,y),\ i = 1, \ldots, M\}$ after phase I in increasing order, denoting

$$d^{(1)} = \min_i d(x_i,y) \le d^{(2)} \le \ldots \le d^{(M)} = \max_i d(x_i,y)$$

(in case of a tie we use any order). Let also $x^1, \ldots, x^M$ be the corresponding ranking of codewords after phase I, i.e. $x^1$ is the codeword closest to $y$, etc. Two cases are possible.

Case 1. If $d^{(3)} \le d^{(2)} + t\gamma n$, then the receiver makes the decoding immediately after phase I (in favor of the codeword $x^1$ closest to $y$). Although the transmitter will still send some signals on phase II, the receiver has already made its decision.

Case 2. If $d^{(3)} > d^{(2)} + t\gamma n$, then after phase I the receiver selects the two most probable messages $\theta_i, \theta_j$, and after transmission on phase II (i.e. after moment $n$) makes a decision between those two remaining messages $\theta_i, \theta_j$ in favor of the more probable of them (based on all signals received on $[1,n]$).

In case 2 the transmitter and the receiver will act in coordination if the lists of two messages built by each of them coincide. Recall that the transmitter's list always contains the true message. Of course, those lists may be different (and then there will be a decoding error), but the probability of such an event should be sufficiently small (which will be secured below).
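The receiver's rule after phase I (Case 1 versus Case 2) can be sketched in a few lines. This is only an illustration of the decision logic described above, not of the whole scheme: the codebook representation (tuples of bits), the function names and the returned labels are assumptions, and at least three codewords are required.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary sequences."""
    return sum(u != v for u, v in zip(a, b))

def receiver_after_phase_one(codebook, y, t):
    """Apply the Case 1 / Case 2 rule to the phase I output y.

    Returns ("decide", i) when the receiver decodes immediately in favor of
    codeword i (Case 1), or ("defer", (i, j)) when it keeps the two closest
    codewords i, j and waits for phase II (Case 2). Here m = len(y) plays the
    role of gamma * n.
    """
    m = len(y)
    # Rank codeword indices by distance to y; ties are broken arbitrarily.
    ranked = sorted(range(len(codebook)), key=lambda i: hamming(codebook[i], y))
    d = [hamming(codebook[i], y) for i in ranked]
    if d[2] <= d[1] + t * m:          # Case 1: d(3) <= d(2) + t*gamma*n
        return ("decide", ranked[0])
    return ("defer", (ranked[0], ranked[1]))  # Case 2

if __name__ == "__main__":
    book = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 1, 1)]
    y = (0, 0, 0, 0, 0, 1)
    print(receiver_after_phase_one(book, y, t=0.1))
    print(receiver_after_phase_one(book, y, t=0.5))
```

In the toy example the distances are 1, 4 and 5, so a small threshold $t$ leads to Case 2 and a large one to Case 1.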
Remarks 5. a) In the case of noiseless feedback (i.e. when $p_1 = 0$) the strategy described reduces to the strategy from § 3 if we set $t = 0$.

b) The strategy described can be improved by introducing an additional parameter $\tau > 0$, such that if $d^{(2)} \ge d^{(1)} + \tau\gamma n$ then the receiver also makes the decoding immediately after phase I (in favor of the codeword $x^1$ closest to $y$). But the introduction of such a parameter leads to too bulky formulas.

To evaluate the decoding error probability $P_{\mathrm{e}}$, denote by $P_1$ and $P_2$ the decoding error probabilities in case 1 (i.e. after the moment $\gamma n$), and in case 2 (i.e. after the moment $n$), respectively. Then for $P_{\mathrm{e}}$ we have

$$P_{\mathrm{e}} \le P_1 + P_2. \tag{24}$$

We evaluate the probabilities $P_1, P_2$ in the right-hand side of (24). Denoting $d_i = d(x_i,y)$, $i = 1, \ldots, M$, for $P_1$ we have

$$P_1 \le M^{-1}\sum_{k=1}^{M} P\left(d^{(1)} \ne d_k \ge d^{(3)} - t\gamma n \,\middle|\, x_k\right). \tag{25}$$

We show that there exists a code such that for $P_1$ we have ($n \to \infty$)

$$\frac{1}{n}\ln\frac{1}{P_1} \ge \gamma E_{\mathrm{low}}(R/\gamma,p,2) - \frac{t\gamma}{3}\ln\frac{q}{p} + o(1). \tag{26}$$



Indeed, using the inequality $\left(\sum a_i\right)^{1/\rho} \le \sum a_i^{1/\rho}$, $\rho \ge 1$, we have 

< 2 1/p P 1/p (d k = d {2) > d (3) - tjn\x k ) + 2 1/p P 1/ " (4 > d^\x k ) < 



Mp 



*7«/(3p) 



mi,m 2 



5^ [P (y\x k ) P (y\x mi )P (y\x m2 )} 



1/3 



y 



and then 



[PP 1 /" (4 ^ d^-d k > d<® - t in \x k )] p/n < 2^ n Q 



fry/3 



- 7 Eex(R/7,P,2) 



A similar inequality holds with $E_{\mathrm{r}}(R/\gamma,p,2)$ instead of $E_{\mathrm{ex}}(R/\gamma,p,2)$. Therefore, using the definition of $E_{\mathrm{low}}(R/\gamma,p,2)$ (see (5)), we get the formula (26).
For the value $P_2$ we have

$$P_2 \le P_{20} + P_{2n}, \tag{27}$$

where $P_{20}$ is the decoding error probability in case 2 for the channel with noiseless feedback, and $P_{2n}$ is the probability that the most probable codeword (excluding the true codeword $x_{\mathrm{true}}$) for the receiver is not the most probable one for the transmitter (while the true codeword is among the two most probable codewords for the receiver).
For the value $P_{20}$ the formula (18) remains valid.

It remains to evaluate $P_{2n}$. For that purpose consider the ensemble of codes $C$ in which each codeword is selected independently with probability $2^{-m}$ among all possible binary vectors of length $m$. We are interested in the value $E_C P_{2n}^{1/\rho}$, $\rho \ge 1$, where the expectation is taken over randomly chosen codes $C$. Clearly,

$$P(y \mid x_{\mathrm{true}}) = q^m\left(\frac{p}{q}\right)^{d}, \qquad d = d(x_{\mathrm{true}},y).$$

For given blocks $x_{\mathrm{true}}$ and $y$ all $(M - 1)$ remaining codewords are independently and equiprobably distributed among all $2^m$ binary vectors of length $m$. The vector $y$ is transmitted over the feedback channel BSC($p_1$) and the transmitter receives the vector $x'$.

Without loss of generality we assume that $x_{\mathrm{true}} = x_M$. For the received block $y$ we arrange all remaining codewords $x_1, \ldots, x_{M-1}$ as $x^1, \ldots, x^{M-1}$, in increasing order of their distances $d(x^i,y)$, i.e. $d(x^1,y)$ is the minimal distance, etc. In case 2 it is necessary to have $d(x^i,y) - d(x^1,y) > tm$, $i = 2, \ldots, M - 1$ (otherwise, case 1 occurs). Moreover, we may assume that the distance $d(x^1,y)$ satisfies the condition ($m \to \infty$)

$$d(x^1,y)/m \le \delta_{\mathrm{GV}}(R/\gamma) - t + o(1), \qquad R > 0, \tag{28}$$

which is equivalent to the inequality

$$h\{d(x^1,y)/m + t\} \le \ln 2 - R/\gamma, \qquad d(x^1,y)/m + t \le 1/2.$$

Indeed, the blocks $y, x_1, \ldots, x_{M-1}$ are distributed independently and equiprobably among all $2^m$ binary vectors of length $m$. For $u > 0$ introduce the random event

$$A(u) = \{d(x^1,y) \ge (u - t)m;\ d(x^2,y) - d(x^1,y) > tm\}.$$

Then, denoting by $w(x)$ the Hamming weight of $x$,

$$P\{A(u)\} \le (M - 1)P\{d(x_1,y) \ge (u - t)m\}\prod_{i=2}^{M-1} P\{d(x_i,y) > um\} =$$

$$= (M - 1)P\{w(x_1) \ge (u - t)m\}\,P^{M-2}\{w(x_2) > um\} \le$$

$$\le (M - 1)\left[1 - P\{w(x_2) \le um\}\right]^{M-2} \le (M - 1)\exp\{-(M - 2)P\{w(x) \le um\}\},$$

where the inequality $(1 - a)^b \le e^{-ab}$, $b > 0$, was used. Note that

$$P\{w(x) \le um\} \ge 2^{-m}\binom{m}{um} \ge \frac{1}{m + 1}\,2^{-m}e^{mh(u)},$$

since [21, formula (12.40)] for any $0 \le k \le n$ the inequalities hold (with $h$ measured in nats)

$$\frac{1}{n + 1}\,e^{nh(k/n)} \le \binom{n}{k} \le e^{nh(k/n)}.$$
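The two-sided bound on binomial coefficients used here can be checked exhaustively for moderate $n$. The following sketch does so in nats; a tiny relative tolerance is added only to absorb floating-point rounding in the equality cases $k = 0$ and $k = n$. The function names are illustrative.

```python
import math

def h(x):
    """Binary entropy in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def check_binomial_bounds(n, rel_tol=1e-12):
    """Check e^{n h(k/n)} / (n + 1) <= C(n, k) <= e^{n h(k/n)} for all 0 <= k <= n."""
    for k in range(n + 1):
        upper = math.exp(n * h(k / n))
        lower = upper / (n + 1)
        c = math.comb(n, k)
        if not (lower <= c * (1.0 + rel_tol) and c <= upper * (1.0 + rel_tol)):
            return False
    return True

if __name__ == "__main__":
    print(all(check_binomial_bounds(n) for n in range(1, 200)))
```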
Therefore

$$P\{A(u)\} \le \exp\left\{\frac{Rm}{\gamma} - \frac{(M - 2)}{M(m + 1)}\,e^{[R/\gamma + h(u) - \ln 2]m}\right\}.$$

We set $u$ such that $[R/\gamma + h(u) - \ln 2]m \ge 4\ln m$. Then for sufficiently large $m$ we have $P\{A(u)\} \le e^{-m}$, and we may neglect an event of such small probability. Therefore the inequality (28) holds.

Assuming that $x_{\mathrm{true}} = x_M$, for given $y, x', x_M$ and randomly (equiprobably) chosen $x_1, x_2$ introduce the set

$$\mathcal{F}(y,x',x_M) = \{x_1, x_2:\ d(x_1,y) \le \delta_{\mathrm{GV}}(R/\gamma)m - tm,\ d(x_1,x') \ge d(x_2,x')\}.$$

We are interested in the values $P_3 = P\{\mathcal{F}(y,x',x_M) \mid y,x',x_M\}$ and $E_{y,x',x_M}P_3^s$, $s > 0$.

Remark 6. In the definition of the set $\mathcal{F}(y,x',x_M)$ we might include the additional constraints $d(x_2,y) \ge \delta_{\mathrm{GV}}(R/\gamma)m$, $d(x_2,y) \ge d(x_M,y)$. But it seems that they do not improve the exponent of $P_3$.

Note that if $d(y,x') < tm$ then $P_{2n} = P_3 = 0$. Moreover, if $p_1 < t$ then

$$P_{2n} \le P\{d(y,x') \ge tm\} \le e^{-mD(t\,\|\,p_1)}.$$
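The last inequality is the usual Chernoff bound for the number of errors in the feedback channel, which is binomially distributed with parameters $m$ and $p_1$. As a small illustration, the sketch below compares the exact binomial tail with $e^{-mD(t\|p_1)}$; the parameter values in the example are arbitrary and the function names are hypothetical.

```python
import math

def kl(x, y):
    """Binary divergence D(x || y) in nats, for 0 <= x <= 1 and 0 < y < 1."""
    out = 0.0
    if x > 0.0:
        out += x * math.log(x / y)
    if x < 1.0:
        out += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return out

def feedback_error_tail(m, p1, t):
    """Exact P{d(y, x') >= t*m}, where d(y, x') is Binomial(m, p1)."""
    k0 = math.ceil(t * m)
    return sum(math.comb(m, k) * p1 ** k * (1.0 - p1) ** (m - k)
               for k in range(k0, m + 1))

def chernoff_bound(m, p1, t):
    """Upper bound e^{-m D(t || p1)}, valid for p1 < t < 1."""
    return math.exp(-m * kl(t, p1))

if __name__ == "__main__":
    m, p1, t = 50, 0.05, 0.2
    print(feedback_error_tail(m, p1, t), chernoff_bound(m, p1, t))
```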
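If $d(y,x') \ge tm$, then for any nonnegative $a, \varphi$

$$P_3 \le e^{a(\delta - t)m}\,E_{x_1,x_2}\left\{e^{-a\,d(x_1,y) + \varphi[d(x_1,x') - d(x_2,x')]} \,\middle|\, y,x',x_M\right\}, \qquad \delta = \delta_{\mathrm{GV}}(R/\gamma).$$

For any $a, b$ and equiprobable $x$

$$E_x\left\{e^{a\,d(x,y) + b\,d(x,x')} \,\middle|\, y,x'\right\} = 2^{-m}\left(1 + e^{a+b}\right)^m\left(\frac{e^a + e^b}{1 + e^{a+b}}\right)^{d(y,x')}.$$

Then when $d(y,x') \ge tm$, we have

$$P_3 \le 2^{-2m}e^{a(\delta - t)m}\left(1 + e^{\varphi - a}\right)^m\left(1 + e^{-\varphi}\right)^m\left(\frac{1 + e^{a+\varphi}}{e^a + e^{\varphi}}\right)^{d(y,x')}.$$

Since $E\,b^{d(y,x')} = (q_1 + p_1 b)^m$, then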

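$$\left\{E\left[b^{d(y,x')};\ d(y,x') \ge tm\right]\right\}^{1/m} \le \min_{u \ge 0}\left\{E\,b^{d(y,x') + u[d(y,x') - tm]}\right\}^{1/m} = \min_{u \ge 0}\left\{b^{-ut}\left(q_1 + p_1 b^{1+u}\right)\right\}. \tag{29}$$

Note that ($b \ge 1$, $z_1 = q_1/p_1$)

$$\min_{u \ge 0}\left\{b^{-ut}\left(z_1 + b^{1+u}\right)\right\} = e^{\mu(b,t,p_1)},$$

$$\mu(b,t,p_1) = \begin{cases} h(t) + (1 - t)\ln z_1 + t\ln b, & \ln(tz_1/(1 - t)) \ge \ln b, \\ \ln(z_1 + b), & \ln(tz_1/(1 - t)) \le \ln b, \end{cases} \tag{30}$$

where in the first case the minimum is attained when

$$1 + u = \frac{\ln(tz_1/(1 - t))}{\ln b}\,.$$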
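Two ingredients of this computation are easy to verify numerically: the averaging identity over an equiprobable vector $x$, and the closed form (30) of the minimum over $u \ge 0$. The sketch below checks the first by brute-force enumeration over a short block and the second by a grid search. All names (`mu_closed_form`, `mu_grid`, etc.), the block length and the grid step are illustrative assumptions, with $z_1 = q_1/p_1$.

```python
import itertools
import math

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def average_brute_force(y, x_prime, a, b):
    """E_x exp(a d(x,y) + b d(x,x')) over x uniform on {0,1}^m, by enumeration."""
    m = len(y)
    total = sum(math.exp(a * hamming(x, y) + b * hamming(x, x_prime))
                for x in itertools.product((0, 1), repeat=m))
    return total / 2 ** m

def average_closed_form(y, x_prime, a, b):
    """2^{-m} (1 + e^{a+b})^m ((e^a + e^b) / (1 + e^{a+b}))^{d(y,x')}."""
    m, d = len(y), hamming(y, x_prime)
    ratio = (math.exp(a) + math.exp(b)) / (1.0 + math.exp(a + b))
    return 2.0 ** (-m) * (1.0 + math.exp(a + b)) ** m * ratio ** d

def h(t):
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)

def mu_closed_form(b, t, z1):
    """Right-hand side of (30), for b >= 1, 0 < t < 1, z1 > 0."""
    if math.log(t * z1 / (1.0 - t)) >= math.log(b):
        return h(t) + (1.0 - t) * math.log(z1) + t * math.log(b)
    return math.log(z1 + b)

def mu_grid(b, t, z1, u_max=10.0, step=1e-3):
    """ln of min over u in [0, u_max] of b^{-ut} (z1 + b^{1+u}), by grid search."""
    best = float("inf")
    for i in range(int(u_max / step) + 1):
        u = i * step
        best = min(best, -u * t * math.log(b) + math.log(z1 + b ** (1.0 + u)))
    return best
```

The grid search only approximates the minimum from above, so agreement is expected up to a small discretization error.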



Therefore for b\ > 1 we have 

{Py,x>,x M P() llm < 2- 2s e a{5 ~ t)s e~ {a+ ^ s+Mb i' t ' Pl) (e a + e*) s + l) s , 



where 



1 + e^ 



e a _|_ e <p 

We should minimize that expression over nonnegative a, (p. We have 



Denote 

and note that 
Then 



Ee ad(x M ,y) = ( g + pe?) 



z = ~, 
V 



Eb d(y,x>) = (gi + pib y 



Zl = — , 

Pi 



(e v -e _v ) (l-e" a ) > 0. 



(E f/l aj' ) aj M -P 3 ') 1/m < 2-Ve" [a(1 ^ + * )+ ^ ]s+/4(fe? '*' Pl) (e Q + e^) s (e v + l) s . 
We apply the random coding with expurgation method, using the inequality 
(E a i) 1/p < E a 1 /", P > !• We have 

and then from (|32|) we get (p = 1/s > 1) 



VP 



E C P 2n /P (C) < e 2^/7 2 -2pP e P/4(6!,tj» 1 )-a(l-*+t)-,p ( g a + ^ + ^ _ 

To avoid bulky formulas, we choose the parameters such that the inequality holds (see (30))

$$\rho\ln(tz_1/(1 - t)) \ge \ln b_1. \tag{33}$$

Then

$$\left[E_C P_{2n}^{1/\rho}(C)\right]^{\rho/m} \le 2^{-2}e^{G\rho + F_2}, \qquad b_1 = \frac{1 + cd}{c + d}\,,$$

$$G = 2R/\gamma + h(t) + \ln\left[p_1^{t}q_1^{1-t}\right] = 2R/\gamma - D(t\,\|\,p_1),$$

$$F_2 = -(1 - \delta + t)\ln d - \ln c + t\ln(1 + dc) + \ln(1 + c) + (1 - t)\ln(d + c),$$

and we should minimize $F_2$ over $c, d \ge 1$.
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Note that 02 does not depend on p. If G < then the best is p — > 00. Since 

1/ 1 /9 / m 

EcP 2 n (^) ~ ^ 0, p — > 00, we may assume that P 2n = 0. If G > then the best is p — 1 
and then it is better to use simply the random coding method). In both cases we need the 

condition f l33|) be satisfied. 

If p 00 then the inequality (1331) is equivalent to the condition tz\j{\ — t) > 1, i.e. 

Pi < £. We set i > pi such that 2P/7 - (t || pi) < 0. Then G < 0, P 2n = 0, and from (J26J) , 

(fTg]l we get 

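$$F_1(R,p,p_1) \ge \max_{\gamma}\min\left\{\gamma E_{low}(R/\gamma,p,2) - \frac{\gamma t}{3}\ln\frac{q}{p},\ \ \gamma E_{low}(R/\gamma,p) + (1-\gamma)E_2(p)\right\}. \qquad (34)$$

Using $t = t_1(R,p_1) > p_1$ (see (9)) we get from (34)

$$F_1(R,p,p_1) \ge \max_{\gamma}\min\left\{\gamma E_{low}(R/\gamma,p,2) - \frac{\gamma t_1}{3}\ln\frac{q}{p},\ \ \gamma E_{low}(R/\gamma,p) + (1-\gamma)E_2(p)\right\}, \qquad (35)$$

from which the formulas (10), (11) and Theorem 1 follow. ▲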
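The choice of $t$ in the proof above (any $t > p_1$ with $2R/\gamma - D(t\,\|\,p_1) < 0$) can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: $D(t\,\|\,p_1)$ is the binary divergence in nats, and the function names `kl_binary` and `smallest_admissible_t` are hypothetical. It finds by bisection the boundary value $t^*$ with $D(t^*\,\|\,p_1) = 2R/\gamma$; every $t \in (t^*, 1)$ then gives $G < 0$.

```python
import math

def kl_binary(t, p):
    """Binary divergence D(t || p) in nats for 0 < t, p < 1."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

def smallest_admissible_t(rate, gamma, p1, tol=1e-12):
    """Root t* >= p1 of D(t || p1) = 2*rate/gamma, found by bisection.

    D(t || p1) increases in t on (p1, 1) from 0 to ln(1/p1), so a root
    exists only if 2*rate/gamma < ln(1/p1).
    """
    target = 2.0 * rate / gamma
    if target >= math.log(1.0 / p1):
        raise ValueError("no t < 1 satisfies D(t || p1) > 2R/gamma")
    lo, hi = p1, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_binary(mid, p1) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For $R = 0$ the root collapses to $t^* = p_1$, which agrees with the zero-rate case treated in § 5.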

Remark 7. Note that if $p_1 \to 0$, then $t_1 \to 0$ and the relation (35) turns into the similar
relation (19) for the channel with noiseless feedback.

To find the function $p_0(R,p)$ of the critical noise level in the feedback channel we set
$\gamma = 1$. Then $p_0 = p_0(R,p)$ is defined by the system of equations

$$E_{low}(R,p,2) - \frac{t_0}{3}\ln\frac{q}{p} = E_{low}(R,p),$$

$$D(t_0\,\|\,p_0) = 2R.$$

In other words, $t_0(R,p)$ and $p_0(R,p) < t_0(R,p)$ are defined by the formulas (7) and (8),
respectively.

§ 5. When does noisy feedback behave like noiseless feedback?

How small should $p_1$ be in order to have the error exponent $F_1(R,p,p_1)$ close to the
similar exponent $F_1(R,p)$ for noiseless feedback? More exactly, when, for a given $a \in (0,1)$,
does the inequality $F_1(R,p,p_1) - E(R,p) \ge (1-a)[F_1(R,p) - E(R,p)]$ hold?

We give a simple estimate for such $p_1$, considering only the case $R = 0$. For the optimal
$\gamma = \gamma_0$ from (10), (11) we have ($E_2(p) = 2E(0,p)$)

$$\gamma_0 = \frac{2E(0,p)}{E(0,p,2) + E(0,p) - p_1\ln(q/p)/3}$$

and then

$$F_1(0,p,p_1) = \frac{2E(0,p)\left[E(0,p,2) - p_1\ln(q/p)/3\right]}{E(0,p,2) + E(0,p) - p_1\ln(q/p)/3},$$

$$F_1(0,p,p_1) - E(0,p) = \frac{E(0,p)\left[E(0,p,2) - E(0,p) - p_1\ln(q/p)/3\right]}{E(0,p,2) + E(0,p) - p_1\ln(q/p)/3}.$$

Now in order to have

$$F_1(0,p,p_1) - E(0,p) \ge (1-a)\left[F_1(0,p) - E(0,p)\right],$$
it is sufficient to have

$$p_1 \le \frac{3a\left[E^2(0,p,2) - E^2(0,p)\right]}{\left[aE(0,p,2) + (2-a)E(0,p)\right]\ln(q/p)}.$$

Since $E(0,p,2) > E(0,p)$, without much loss, we may replace the last inequality by a stronger
one:

$$p_1 \le \frac{3a\left[E(0,p,2) - E(0,p)\right]}{\ln(q/p)} = p_n(p,a).$$

In Fig. 4 the plot of the function $p_n(p, 0.1)$ is given.
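The zero-rate estimates of this section can be checked numerically. The sketch below is an illustration only and rests on assumptions: it takes the closed forms $E(0,p) = \frac{1}{4}\ln\frac{1}{4pq}$ and $E(0,p,2) = -\frac{3}{4}\ln(p^{1/3}q^{2/3} + p^{2/3}q^{1/3})$ as read from (45), and it takes the simplified threshold to be $3a[E(0,p,2) - E(0,p)]/\ln(q/p)$, which is a reading of a damaged formula. All function names are hypothetical.

```python
import math

def zero_rate_exponents(p):
    """Return (E(0,p), E(0,p,2)) for BSC(p), closed forms assumed from (45)."""
    q = 1.0 - p
    e1 = 0.25 * math.log(1.0 / (4.0 * p * q))
    a1 = p ** (1 / 3) * q ** (2 / 3) + p ** (2 / 3) * q ** (1 / 3)
    e2 = -0.75 * math.log(a1)
    return e1, e2

def f1_zero_rate(p, p1):
    """Lower bound F_1(0,p,p1) at the optimal gamma_0; p1 = 0 is noiseless feedback."""
    e1, e2 = zero_rate_exponents(p)
    c = p1 * math.log((1.0 - p) / p) / 3.0
    return 2.0 * e1 * (e2 - c) / (e2 + e1 - c)

def p1_threshold(p, a):
    """Largest p1 keeping a (1 - a) share of the noiseless-feedback gain."""
    e1, e2 = zero_rate_exponents(p)
    log_ratio = math.log((1.0 - p) / p)
    return 3.0 * a * (e2 * e2 - e1 * e1) / ((a * e2 + (2.0 - a) * e1) * log_ratio)

def p1_threshold_simplified(p, a):
    """Stronger (smaller) threshold; the denominator ln(q/p) is an assumption."""
    e1, e2 = zero_rate_exponents(p)
    return 3.0 * a * (e2 - e1) / math.log((1.0 - p) / p)
```

At `p1_threshold(p, a)` the inequality for the gain holds with equality, so the first threshold is in fact exact for the bound used here; the simplified one is slightly more conservative.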

Example 2. Consider the case $p = (1-\varepsilon)/2$, $\varepsilon \to 0$. Then $C(p) \sim \varepsilon^2/2$ and
$E(0,p,2) \approx 2C/3$, $E(0,p) \approx C/2$. As a result, we get

$$p_n(p,a) = \frac{a(1-2p)\,[1 + o(1)]}{8}, \qquad p \to 1/2.$$

In other words, if the forward BSC(p) is very bad, then in order to improve its error exponent 
we need a very good feedback channel BSC(pi). 

§ 6. Auxiliary formulas and results 

Lower bounds for the decoding error exponents. All formulas below are derived
following Gallager's technique [15, 16].

1) Random coding bounds:

$$E(R,p,L) \ge E_r(R,p,L), \qquad R \ge 0. \qquad (36)$$
Moreover ($R_{crit,L}(p)$ is defined in (1)),

$$E(R,p,L) = E_r(R,p,L) = E_{sp}(R,p), \qquad R_{crit,L}(p) \le R \le C(p), \qquad (37)$$
and for $R \le R_{crit,L}(p)$ we have

$$E(R,p,L) \ge E_r(R,p,L) = L(\ln 2 - R) - (1+L)\ln\left[p^{1/(1+L)} + q^{1/(1+L)}\right]. \qquad (38)$$

Since $R_{crit,L}(p) \to 0$ as $L \to \infty$, we have $E(R,p,L) \to E_{sp}(R,p)$, $L \to \infty$, for any $R > 0$.
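The low-rate expression (38) is the straight line $E_0(L) - LR$, where $E_0(\rho)$ is Gallager's function for the BSC with equiprobable inputs. A minimal Python sketch follows; the function names are hypothetical, and the formula is meant only for $R \le R_{crit,L}(p)$, a range that the code does not check.

```python
import math

def gallager_e0(rho, p):
    """Gallager function E_0(rho) of BSC(p) with equiprobable inputs, in nats."""
    q = 1.0 - p
    s = 1.0 / (1.0 + rho)
    return rho * math.log(2) - (1.0 + rho) * math.log(p ** s + q ** s)

def random_coding_low_rate(rate, p, list_size):
    """Right-hand side of (38): L(ln 2 - R) - (1+L) ln[p^(1/(1+L)) + q^(1/(1+L))]."""
    return gallager_e0(list_size, p) - list_size * rate
```

For $L = 1$ and $R = 0$ this gives $\ln 2 - \ln(1 + 2\sqrt{pq})$, the familiar cutoff-rate value.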

2) Random coding with expurgation bound: 

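$$E(R,p,L) \ge E_{ex}(R,p,L) = \max_{\rho \ge 1}\left\{-\rho L R - \rho\ln f(\rho,L,p)\right\}, \qquad R \ge 0, \qquad (39)$$

where

$$f(\rho,L,p) = 2^{-(L+1)}\left\{2 + \sum_{i=1}^{L}\binom{L+1}{i} a_i^{1/\rho}\right\}, \qquad a_i = p^{i/(L+1)}q^{1-i/(L+1)} + q^{i/(L+1)}p^{1-i/(L+1)}.$$

The bound (39) improves the random coding bound (38) for $0 \le R < R_{min,L}(p)$ (see (42)),
but it does not give $E_{sp}(R,p)$. Note also that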



$$f(\rho,L,p) = \mathbf{E}_{m,m_1,\ldots,m_L}\left\{\sum_{y}\left[P(y|x_m)P(y|x_{m_1})\cdots P(y|x_{m_L})\right]^{1/(L+1)}\right\}^{1/\rho}, \qquad (40)$$



where all components of each codeword $x_j$ are chosen independently and equiprobably from
0 and 1.

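In particular,

$$E_{ex}(R,p) = E_{ex}(R,p,1) = \max_{\rho \ge 1}\left\{\rho\ln 2 - \rho R - \rho\ln\left[1 + (2\sqrt{pq})^{1/\rho}\right]\right\},$$
$$E_{ex}(R,p,2) = \max_{\rho \ge 1}\left\{\rho\ln 4 - 2\rho R - \rho\ln\left[1 + 3\left(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\right)^{1/\rho}\right]\right\}.$$

The functions $E(R,p,L)$, $E_r(R,p,L)$ and $E_{ex}(R,p,L)$ do not decrease in $L$. In particular,

$$E_{ex}(R,p) \le E_{ex}(R,p,2), \qquad R \le R_{crit}(p). \qquad (41)$$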
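The maximization over $\rho \ge 1$ in the expurgation bound can be approximated on a grid. The Python sketch below is a rough illustration, not the authors' procedure: it assumes the single-letter form $f(\rho,L,p) = 2^{-(L+1)}\{2 + \sum_{i=1}^{L}\binom{L+1}{i}a_i^{1/\rho}\}$ with $a_i = p^{i/(L+1)}q^{1-i/(L+1)} + q^{i/(L+1)}p^{1-i/(L+1)}$, which reproduces the two special cases displayed above, and it uses a hypothetical log-spaced grid truncated at `rho_max`.

```python
import math

def f_expurgation(rho, list_size, p):
    """Single-letter function f(rho, L, p); the form is an assumption (see lead-in)."""
    q = 1.0 - p
    n_words = list_size + 1
    total = 2.0  # contribution of the all-zero and all-one input tuples
    for i in range(1, list_size + 1):
        frac = i / n_words
        a_i = p ** frac * q ** (1.0 - frac) + q ** frac * p ** (1.0 - frac)
        total += math.comb(n_words, i) * a_i ** (1.0 / rho)
    return total / 2 ** n_words

def expurgated_exponent(rate, p, list_size, rho_max=1e4, steps=4000):
    """Grid approximation of max over rho >= 1 of -rho*L*R - rho*ln f(rho, L, p)."""
    best = -math.inf
    for k in range(steps + 1):
        rho = rho_max ** (k / steps)  # log-spaced grid from 1 to rho_max
        value = -rho * list_size * rate - rho * math.log(f_expurgation(rho, list_size, p))
        best = max(best, value)
    return best
```

At $R = 0$ the supremum is approached as $\rho \to \infty$, so the grid value slightly underestimates the limit; the gap is of order $1/\rho_{\max}$.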

In order to get a more convenient representation for the functions E ex (R,p) and 
E ex (R,p, L), introduce rates 



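$$R_{min,L}(p) = \ln 2 - \frac{L+1}{L}\ln\left[p^{1/(L+1)} + q^{1/(L+1)}\right] + \frac{\sum_{i=1}^{L}\binom{L+1}{i}a_i\ln a_i}{2L\left[p^{1/(L+1)} + q^{1/(L+1)}\right]^{L+1}}, \qquad (42)$$

where $a_i = p^{i/(L+1)}q^{1-i/(L+1)} + q^{i/(L+1)}p^{1-i/(L+1)}$.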



The function $R_{min,L}(p)$ monotonically decreases in $L$ and $R_{min,L}(p) < R_{crit,L}(p)$ if $L \ge 1$
and $0 < p < 1/2$. In particular,



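$$R_{min}(p) = R_{min,1}(p) = \ln 2 - h\left(\frac{2\sqrt{pq}}{1 + 2\sqrt{pq}}\right),$$

$$R_{min,2}(p) = \ln 2 - \frac{1}{2}\left[\ln(1 + 3a_1) - \frac{3a_1\ln a_1}{1 + 3a_1}\right], \qquad a_1 = p^{1/3}q^{2/3} + p^{2/3}q^{1/3}. \qquad (43)$$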



We also have $R_{min,2}(p) < R_{min,1}(p) < R_{crit}(p)$, $0 < p < 1/2$.
Now

$$E_{ex}(R,p,L) < E_r(R,p,L) = E_{sp}(R,p), \qquad R > R_{crit,L}(p),$$

$$E_{ex}(R,p,L) = E_r(R,p,L), \qquad R_{min,L}(p) \le R \le R_{crit,L}(p),$$

$$E_{ex}(R,p,L) > E_r(R,p,L), \qquad 0 \le R < R_{min,L}(p).$$
Moreover

$$E_{ex}(R,p) = \frac{\delta_{GV}(R)}{2}\ln\frac{1}{4pq}, \qquad 0 \le R \le R_{min}(p). \qquad (44)$$

Note also that $0 \le R \le R_{min}(p)$ corresponds to the case $\delta_{GV}(R) \ge 2\sqrt{pq}/(1 + 2\sqrt{pq})$.
If $L = 2$, then

$$E_{ex}(R,p,2) = -v\ln a_1, \qquad 0 \le R \le R_{min,2}(p),$$
where $a_1$ is defined in (43), and $v$ is the unique root of the equation

$$\ln 4 - h(v) - v\ln 3 = 2R, \qquad 0 < v \le \frac{3}{4}.$$

In particular,

$$E_{ex}(0,p) = E(0,p) = \frac{1}{4}\ln\frac{1}{4pq},$$

$$E_{ex}(0,p,2) = E(0,p,2) = -\frac{3}{4}\ln\left(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\right) \qquad (45)$$

(the second relation is established in [18]).
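The parametric representation for $L = 2$ can be evaluated by solving $\ln 4 - h(v) - v\ln 3 = 2R$ for $v$. The Python sketch below is illustrative and makes assumptions: $a_1 = p^{1/3}q^{2/3} + p^{2/3}q^{1/3}$, and $R_{min,2}(p)$ is the value obtained from (39) by requiring that the maximizing $\rho$ equals 1. The function names are hypothetical.

```python
import math

def binary_entropy(v):
    """Binary entropy h(v) in nats, with h(0) = h(1) = 0."""
    if v in (0.0, 1.0):
        return 0.0
    return -v * math.log(v) - (1 - v) * math.log(1 - v)

def a1(p):
    """a_1 = p^(1/3) q^(2/3) + p^(2/3) q^(1/3) (assumed definition)."""
    q = 1.0 - p
    return p ** (1 / 3) * q ** (2 / 3) + p ** (2 / 3) * q ** (1 / 3)

def root_v(rate, tol=1e-13):
    """Root v in (0, 3/4] of ln 4 - h(v) - v ln 3 = 2R, for 0 <= R < ln 2.

    The left-hand side decreases from ln 4 at v = 0 to 0 at v = 3/4.
    """
    lo, hi = 0.0, 0.75
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g = math.log(4) - binary_entropy(mid) - mid * math.log(3)
        if g > 2.0 * rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expurgated_list2_low_rate(rate, p):
    """E_ex(R, p, 2) = -v ln a_1, intended for 0 <= R <= R_min,2(p)."""
    return -root_v(rate) * math.log(a1(p))

def r_min_list2(p):
    """R_min,2(p) = ln 2 - [ln(1+3a_1) - 3 a_1 ln a_1/(1+3a_1)]/2 (reconstructed)."""
    a = a1(p)
    return math.log(2) - 0.5 * (math.log(1 + 3 * a) - 3 * a * math.log(a) / (1 + 3 * a))
```

At $R = 0$ the root is $v = 3/4$, which recovers the second relation in (45); at $R = R_{min,2}(p)$ the root is $v = 3a_1/(1 + 3a_1)$.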

Existence of a code with given properties. We are interested in a code $C$ such that
each of its codewords has certain properties $A_1, A_2, \ldots$. For that purpose we use the following
result, which is a natural modification of the cute Lemma 5.7 from [16].

Assume that we choose randomly (in an arbitrary way) a code $C$ with $M'$ codewords $x_m$,
and for each $x_m$, $m = 1, \ldots, M'$, we have

$$P_{\text{over codes}}\{x_m \text{ does not have property } A\} \le 1/2. \qquad (46)$$

Lemma. If the condition (46) is satisfied, then there exists a code in the ensemble of
codes with $M' = 2M - 1$ codewords for which at least $M$ of its codewords have the property $A$.

Proof remains the same as in [16, Lemma 5.7] (it amounts to changing the summation
order in the corresponding double sum). ▲

If there are, say, four properties $A_1, \ldots, A_4$, then assume that for each $x_m$, $m = 1, \ldots, M'$,
we have

$$P_{\text{over codes}}\{x_m \text{ does not have property } A_i\} \le 1/8, \qquad i = 1, \ldots, 4. \qquad (47)$$

Corollary 2. If the condition (47) is satisfied, then there exists a code in the
ensemble of codes with $M' = 2M - 1$ codewords for which at least $M$ of its codewords have all
four properties $A_i$, $i = 1, \ldots, 4$.
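The averaging argument behind the Lemma can be illustrated on a small explicit ensemble. The Python sketch below is a toy check under assumptions, not part of the proof: the ensemble is finite and equiprobable, each code is a tuple of booleans (True means the codeword has the property), and the names `random_ensemble` and `best_good_count` are hypothetical. For Corollary 2 the same check applies after the union bound, since four properties failing with probability at most 1/8 each fail jointly with probability at most 1/2.

```python
import random

def random_ensemble(num_codes, num_codewords, rng):
    """Equiprobable ensemble of codes as boolean tuples.

    Each codeword index lacks the property in at most half of the codes,
    which is condition (46) for an equiprobable ensemble.
    """
    columns = []
    for _ in range(num_codewords):
        bad = rng.randint(0, num_codes // 2)
        column = [False] * bad + [True] * (num_codes - bad)
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[c] for column in columns) for c in range(num_codes)]

def best_good_count(ensemble):
    """Largest number of codewords having the property, over all codes."""
    return max(sum(code) for code in ensemble)
```

With $M' = 2M - 1$ codewords the average number of bad codewords per code is at most $M - 1/2$, so some code has at most $M - 1$ bad codewords, which is what `best_good_count` confirms.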

In our case the property $A_1$ means that the codeword $x_m$ has small decoding error
probability; $A_2$ means that $x_m$ has small list size $L = 2$ decoding error probability; $A_3$, $A_4$
mean that for the codeword $x_m$ the relations (18) and (28), respectively, hold.

Proof of the formula (18). Consider a code $C$ with $M$ codewords $x_1, \ldots, x_M$ of length
$n + k$. Each codeword $x_i$ has the form $x_i = (x'_i, x''_i)$, where $x'_i$ has length $n$ and $x''_i$ has length
$k$. We suppose that the parts $\{x''_i\}$ are given, while the parts $\{x'_i\}$ are chosen randomly (in
some way). We also assume that

$$\min_{i \ne j} d(x''_i, x''_j) = \delta k. \qquad (48)$$



Using maximum likelihood decoding, denote by $P_{e,m}$ the conditional decoding error
probability provided the codeword $x_m$ was transmitted. An output block $y$ has the form
$y = (y', y'')$, where $y'$, $y''$ have length $n$ and $k$, respectively. Then
$P(y|x_m) = P(y'|x'_m)P(y''|x''_m)$. Using the inequality $\left(\sum_i a_i\right)^s \le \sum_i a_i^s$, $0 < s \le 1$, and the
formula

$$\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m_1})} = (4pq)^{d(x'_m, x'_{m_1})/2},$$

we have



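$$\begin{aligned}
P^s_{e,m} &\le \sum_{m_1 \ne m}\left[\sum_{y}\sqrt{P(y|x_m)P(y|x_{m_1})}\right]^s \\
&= \sum_{m_1 \ne m}\left[\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m_1})}\right]^s\left[\sum_{y''}\sqrt{P(y''|x''_m)P(y''|x''_{m_1})}\right]^s \\
&\le \max_{m_1 \ne m}\left(2\sqrt{pq}\right)^{s\,d(x''_m, x''_{m_1})}\sum_{m_1 \ne m}\left[\sum_{y'}\sqrt{P(y'|x'_m)P(y'|x'_{m_1})}\right]^s \\
&\le \left(2\sqrt{pq}\right)^{\delta s k}\sum_{m_1 \ne m}(4pq)^{s\,d(x'_m, x'_{m_1})/2}.
\end{aligned} \qquad (49)$$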



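Consider an ensemble of codes in which each codeword $x'_m$ is selected independently with
probability $2^{-n}$ among all possible binary vectors of length $n$. Since

$$\mathbf{E}\,z^{d(x'_m, x'_{m_1})} = \mathbf{E}\,z^{w(x'_m)} = \left(\frac{1+z}{2}\right)^n,$$

we get

$$\left(\mathbf{E}P^s_{e,m}\right)^{1/s} \le \left(2\sqrt{pq}\right)^{\delta k}\left\{e^{R}\,2^{-1}\left[1 + (2\sqrt{pq})^{s}\right]\right\}^{n/s}.$$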

Further derivation follows Theorem 5.7.1 from [16]. As a result, defining $\rho = 1/s$, $\rho \ge 1$, we
get that there exists a code with $M$ codewords such that for any $m = 1, \ldots, M$ we have

$$\frac{1}{n}\ln\frac{1}{P_{e,m}} \ge \frac{\delta k}{n}\ln\frac{1}{2\sqrt{pq}} + \max_{\rho \ge 1}\left\{\rho\ln 2 - \rho R - \rho\ln\left[1 + (2\sqrt{pq})^{1/\rho}\right]\right\}.$$

From that relation the formula (18) follows. ▲
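The averaging identity $\mathbf{E}\,z^{d(x'_m, x'_{m_1})} = ((1+z)/2)^n$ used in the proof above can be verified by brute force for small $n$. The Python sketch below is a sanity check only; the function name is hypothetical, and the enumeration is exponential in $n$, so it is meant for short vectors.

```python
import itertools

def mean_z_power_distance(z, x):
    """Average of z**d(x, x1) over all binary vectors x1 of the same length as x."""
    n = len(x)
    total = 0.0
    for x1 in itertools.product((0, 1), repeat=n):
        # Hamming distance between the fixed vector x and the candidate x1
        distance = sum(a != b for a, b in zip(x, x1))
        total += z ** distance
    return total / 2 ** n
```

With $z = (4pq)^{s/2} = (2\sqrt{pq})^{s}$ this is the step that turns the sum in (49) into the factor $2^{-n}[1 + (2\sqrt{pq})^{s}]^{n}$.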

The authors wish to thank the University of Tokyo for supporting this joint research.
Fig. 2. The plot of the function $p_1(R, 0.01)$ ($R_{crit} \approx 0.387$).
Fig. 4. The plot of the function $p_n(p, 0.1)$.